17 research outputs found

    Accounting for variance and hyperparameter optimization in machine learning benchmarks

    The recent revolution in machine learning has relied heavily on the use of standardized benchmarks. Providing clear targets and undeniable measures of improvement for learning algorithms, they are at the center of the scientific methodology in machine learning. They do not, however, ensure the validity of results, so some scientific conclusions about advances in artificial intelligence may prove to be wrong. In this thesis we address this question by first raising the issue (Chapter 5), then studying it in depth to propose solutions and recommendations (Chapter 6), and finally building a tool to help improve the methodology of researchers (Chapter 7). In the first article, Chapter 5, we demonstrate the issue of reproducibility in stable and consensual benchmarks, implying that these problems are endemic to a large set of machine learning applications that are possibly less stable or less consensual. We highlight the important impact of stochasticity in benchmarks, even in ones as stable as image classification, and contend that solutions for reproducible benchmarks must account for this stochasticity. In the second article, Chapter 6, we study the sources of variation that are typical in machine learning benchmarks, measure their effect on methods for comparing algorithms, and provide recommendations based on our results. One important contribution of this work is measuring the reliability of a cheap-to-compute but biased estimator of the average performance of algorithms. As explained in the article, an ideal estimator involves multiple rounds of hyperparameter optimization, which makes it too computationally expensive. Most researchers must therefore resort to the biased alternative, but until now it was unknown how severely this degrades the quality of estimation. Based on our results, we provide recommendations for comparing algorithms on benchmarks with limited computational budgets. First, as many sources of variation as possible should be randomized. Second, the randomization should include the partitioning of data into training, validation, and test sets, which turns out to be the most important source of variance. Third, statistical tests, such as the variant of the Mann-Whitney U test presented in our article, should be used instead of ad-hoc comparisons of averages, so that the uncertainty of performance measurements is accounted for when comparing machine learning algorithms. In Chapter 7, we present a hyperparameter optimization framework developed with the main goal of encouraging best practices in hyperparameter optimization. The framework is designed to favor a simple and intuitive interface adapted to the workflow of machine learning researchers. It includes a new version control system for experiments, to help researchers organize their rounds of experimentation and leverage prior results for more efficient hyperparameter optimization. Hyperparameter optimization plays an important role in benchmarking, since hyperparameters are a significant confounding factor. Providing an instrument for researchers to properly control this confounding factor is complementary to the guidelines for accounting for sources of variation in Chapter 6. Our recommendations, together with our tool for hyperparameter optimization, provide a solid basis for a robust and reliable methodology in machine learning benchmarks.
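The recommendations above can be sketched in code. The snippet below is an illustrative example, not the thesis's actual protocol: the accuracy values are simulated stand-ins for scores obtained by re-running two algorithms over randomized train/validation/test splits, which are then compared with a two-sided Mann-Whitney U test rather than by a bare comparison of averages.

```python
# Illustrative sketch: compare two algorithms across randomized data
# splits with a Mann-Whitney U test instead of comparing raw means.
# The score arrays are simulated; in practice each value would come
# from one full run with the data-partitioning seed randomized.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical test accuracies, one value per randomized split.
scores_a = rng.normal(loc=0.900, scale=0.01, size=20)
scores_b = rng.normal(loc=0.905, scale=0.01, size=20)

# Two-sided test: does one algorithm tend to score higher than the
# other once the variance across splits is taken into account?
stat, p_value = mannwhitneyu(scores_a, scores_b, alternative="two-sided")

print(f"mean A = {scores_a.mean():.4f}, mean B = {scores_b.mean():.4f}")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Difference unlikely to be explained by split randomness alone.")
else:
    print("No significant difference at the 0.05 level.")
```

Note that with only 20 randomized splits per algorithm, a small mean difference can easily fail to reach significance, which is precisely the uncertainty that ad-hoc average comparisons hide.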

    EmoNets: Multimodal deep learning approaches for emotion recognition in video

    The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which consider combinations of features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network, focusing on capturing visual information in detected faces; a deep belief net, focusing on the representation of the audio stream; a K-Means-based "bag-of-mouths" model, which extracts visual features around the mouth region; and a relational autoencoder, which addresses spatio-temporal aspects of videos. We explore multiple methods for combining cues from these modalities into one common classifier, which achieves considerably greater accuracy than predictions from our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.
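The combination of modality cues described above can be illustrated with a minimal late-fusion sketch. Everything in this example is hypothetical: the modality names, probability vectors, and weights are illustrative stand-ins, not the paper's learned combination method.

```python
# Hypothetical late-fusion sketch: combine per-modality class
# probabilities (e.g. face CNN, audio DBN, bag-of-mouths) by a
# weighted average.  All numbers below are illustrative.
import numpy as np

NUM_CLASSES = 7  # the seven EmotiW emotion labels

# Assumed per-modality probability vectors for one video clip.
predictions = {
    "face_cnn":      np.array([0.50, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05]),
    "audio_dbn":     np.array([0.30, 0.20, 0.20, 0.10, 0.10, 0.05, 0.05]),
    "bag_of_mouths": np.array([0.40, 0.30, 0.10, 0.05, 0.05, 0.05, 0.05]),
}

# Illustrative modality weights, e.g. tuned on a validation set.
weights = {"face_cnn": 0.5, "audio_dbn": 0.3, "bag_of_mouths": 0.2}

# Weighted average of the modality distributions, renormalized so the
# fused vector is again a probability distribution.
fused = sum(weights[m] * p for m, p in predictions.items())
fused /= fused.sum()

predicted_label = int(np.argmax(fused))
print("fused distribution:", np.round(fused, 3))
print("predicted class index:", predicted_label)  # index 0 here
```

A weighted average is only the simplest fusion choice; the abstract notes that several combination methods were explored before settling on the winning one.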

    Bradykinin receptors : agonists, antagonists, expression, signaling and adaptation to sustained stimulation

    Bradykinin-related peptides, the kinins, are blood-derived peptides that stimulate 2 G protein–coupled receptors, the B1 and B2 receptors (B1R, B2R). The pharmacologic and molecular identities of these 2 receptor subtypes will be succinctly reviewed, with emphasis on drug development, receptor expression, signaling, and adaptation to persistent stimulation. Peptide and nonpeptide antagonists and fluorescent ligands have been produced for each receptor. The B2R is widely and constitutively expressed in mammalian tissues, whereas the B1R is mostly inducible under the effect of cytokines during infection and immunopathology. Both receptor subtypes mediate the vascular aspects of inflammation (vasodilation, edema formation). On this basis, icatibant, a peptide antagonist of the B2R, is approved for the management of hereditary angioedema attacks. Other clinical applications are still elusive despite the maturity of the medicinal chemistry efforts applied to kinin receptors. While both receptor subtypes are mainly coupled to the Gq protein and related second messengers, the B2R is temporarily desensitized by a cycle of phosphorylation/endocytosis followed by recycling, whereas the nonphosphorylable B1R is relatively resistant to desensitization and translocated to caveolae on activation.

    Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020

    How do machine-learning researchers run their empirical validation? In the context of a push for improved reproducibility and benchmarking, this question is important for developing new tools for model comparison. This document summarizes a simple survey about experimental procedures, sent to authors of published papers at two leading conferences, NeurIPS 2019 and ICLR 2020. It gives a simple picture of how hyper-parameters are set, how many baselines and datasets are included, and how seeds are used.

    Autophagic flux inhibition and lysosomogenesis ensuing cellular capture and retention of the cationic drug quinacrine in murine models

    The proton pump vacuolar (V)-ATPase is the driving force that mediates the concentration of cationic drugs (weak bases) in the late endosome-lysosome continuum; secondary cell reactions include the protracted transformation of enlarged vacuoles into autophagosomes. We used the inherently fluorescent tertiary amine quinacrine in murine models to further assess the accumulation and signaling associated with cation trapping. Primary fibroblasts concentrate quinacrine ∼5,000-fold from their culture medium (KM 9.8 µM; transport studies). The drug is present in perinuclear granules that are mostly positive for Rab7 and LAMP1 (microscopy). Both drug uptake and retention are extensively inhibited by treatments with the V-ATPase inhibitor bafilomycin A1. The H+ ionophore monensin also prevented quinacrine concentration by fibroblasts. However, inhibition of plasma membrane transporters or of the autophagic process with spautin-1 did not alter quinacrine transport parameters. Ancillary experiments did not support that low micromolar concentrations of quinacrine are substrates for organic cation transporters-1 to -3 or P-glycoprotein. The secondary autophagy induced by quinacrine in cells may derive from the accumulation of incompetent autophagolysosomes, as judged from the accumulation of p62/SQSTM1 and LC3 II (immunoblots). Accordingly, protracted lysosomogenesis is evidenced by increased expression of LAMP1 and LAMP2 in quinacrine-treated fibroblasts (48 h, immunoblots), a response that follows the nuclear translocation of the lysosomal genesis transcription factor TFEB and upregulation of LAMP1 and −2 mRNAs (24 h). Quinacrine administration to live mice evidenced variable distribution to various organs and heterogeneous accumulation within the lung (stereo-microscopy, extraction). Dose-dependent in vivo autophagic and lysosomal accumulation was observed in the lung (immunoblots). 
No evidence was found for transport or extrusion mechanisms modulating the cellular uptake of micromolar quinacrine at the plasma membrane level. As shown in vitro and in vivo, V-ATPase-mediated cation sequestration is associated, above a certain threshold, with autophagic flux inhibition and feedback lysosomogenesis.

    Rapport par M. de Bouthillier sur la discipline intérieure des corps, lors de la séance du 14 septembre 1790

    Bureaux de Pusy Jean-Xavier, Bouthillier de Beaujeu Charles-Léon, marquis de. Rapport par M. de Bouthillier sur la discipline intérieure des corps, lors de la séance du 14 septembre 1790. In: Archives Parlementaires de 1787 à 1860 - Première série (1787-1799), Tome XVIII - Du 12 août au 15 septembre 1790. Paris: Librairie Administrative P. Dupont, 1884. pp. 751-752.